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Abstract 

A general framework for solving image inverse problems is introduced in this paper. The approach is 
based on Gaussian mixture models, estimated via a computationally efficient MAP-EM algorithm. A dual 
mathematical interpretation of the proposed framework with structured sparse estimation is described, which 
shows that the resulting piecewise linear estimate stabilizes the estimation when compared to traditional 
sparse inverse problem techniques. This interpretation also suggests an effective dictionary motivated initial- 
ization for the MAP-EM algorithm. We demonstrate that in a number of image inverse problems, including 
inpainting, zooming, and deblurring, the same algorithm produces either equal, often significantly better, or 
very small margin worse results than the best published ones, at a lower computational cost. 
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Image restoration often requires to solve an inverse problem. It amounts to estimate an image f from a 



\& ' measurement 

o . 

y = Uf+w, 

obtained through a non-invertible linear degradation operator U, and contaminated by an additive noise 
w. Typical degradation operators include masking, subsampling in a uniform grid and convolution, the 
corresponding inverse problems often named inpainting or interpolation, zooming and deblurring. Estimating 
f requires some prior information on the image, or equivalently image models. Finding good image models 
is therefore at the heart of image estimation. 

Mixture models are often used as image priors since they enjoy the flexibility of signal description by as- 
suming that the signals are generated by a mixture of probability distributions [49]. Gaussian mixture models 
(GMM) have been shown to provide powerful tools for data classification and segmentation applications (see 
for example fffl . [30ll , [54ll , [58]), however, they have not yet been shown to generate state-of-the-art in a 
general class of inverse problems. Ghahramani and Jordan have applied GMM for learning from incomplete 
data, i.e., images degraded by a masking operator, and have shown good classification results, however, it 
does not lead to state-of-the-art inpainting [31]. Portilla et al. have shown image denoising impressive results 
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by assuming Gaussian scale mixture models (deviating from GMM by assuming different scale factors in 
the mixture of Gaussians) on wavelet representations 11551 , and have recently extended its applications on 
image deblurring [32]. Recently, Zhou et al. have developed an nonparametric Bayesian approach using 
more elaborated models, such as beta and Dirichlet processes, which leads to excellent results in denoising 
and inpainting ifTTTl . 

The now popular sparse signal models, on the other hand, assume that the signals can be accurately 
represented with a few coefficients selecting atoms in some dictionary [45 ]. Recently, very impressive image 
restoration results have been obtained with local patch-based sparse representations calculated with dictio- 
naries learned from natural images |2), ll23l . P31 . Relative to pre-fixed dictionaries such as wavelets [45], 
curvelets IfTTTl . and bandlets ll46l . learned dictionaries enjoy the advantage of being better adapted to the 
images, thereby enhancing the sparsity. However, dictionary learning is a large-scale and highly non-convex 
problem. It requires high computational complexity, and its mathematical behavior is not yet well understood. 
In the dictionaries aforementioned, the actual sparse image representation is calculated with relatively 
expensive non-linear estimations, such as l\ or matching pursuits [19], C2l . |48l . More importantly, as 
will be reviewed in Section IIII-AI with a full degree of freedom in selecting the approximation space (atoms 
of the dictionary), non-linear sparse inverse problem estimation may be unstable and imprecise due to the 
coherence of the dictionary ||47l . 

Structured sparse image representation models further regularize the sparse estimation by assuming de- 
pendency on the selection of the active atoms. One simultaneously selects blocks of approximation atoms, 
thereby reducing the number of possible approximation spaces 0, |25l , l|26l , 11331 , f36l , 1591 . These 
structured approximations have been shown to improve the signal estimation in a compressive sensing context 
for a random operator U. However, for more unstable inverse problems such as zooming or deblurring, 
this regularization by itself is not sufficient to reach state-of-the-art results. Recently some good image 
zooming results have been obtained with structured sparsity based on directional block structures in wavelet 
representations |47l . However, this directional regularization is not general enough to be extended to solve 
other inverse problems. 

This work shows that the Gaussian mixture models (GMM), estimated via an MAP-EM (maximum a 
posteriori expectation-maximization) algorithm, lead to results in the same ballpark as the state-of-the- 
art in a number of imaging inverse problems, at a lower computational cost. The MAP-EM algorithm is 
described in Section|IIl After briefly reviewing sparse inverse problem estimation approaches, a mathematical 
equivalence between the proposed piecewise linear estimation (PLE) from GMM/MAP-EM and structured 
sparse estimation is shown in Section [III] This connection shows that PLE stabilizes the sparse estimation 
with a structured learned overcomplete dictionary composed of a union of PCA (Principal Component 
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Analysis) bases, and with collaborative prior information incorporated in the eigenvalues, that privileges in 
the estimation the atoms that are more likely to be important. This interpretation suggests also an effective 
dictionary motivated initialization for the MAP-EM algorithm. In Section [TV] we support the importance of 
different components of the proposed PLE via some initial experiments. Applications of the proposed PLE 
in image inpainting, zooming, and deblurring are presented in sections [VJ [VTJ and IVIII respectively, and are 
compared with previous state-of-the-art methods. Conclusions are drawn in Section IVIII I 

II. Piecewise Linear Estimation 

This section describes the Gaussian mixture models (GMM) and the MAP-EM algorithm, which lead to 
the proposed piecewise linear estimation (PLE). 

A. Gaussian Mixture Model 

Natural images include rich and non-stationary content, whereas when restricted to local windows, image 
structures appear to be simpler and are therefore easier to model. Following some previous works ||2), ifTOl . 
Il43l , an image is decomposed into overlapping y/N x \/N local patches 



where U, is the degradation operator restricted to the patch i, y,, f, and w, are respectively the degraded, 
original image patches and the noise restricted to the patch, with 1 < i < I, I being the total number of 
patches. Treated as a signal, each of the patches is estimated, and their corresponding estimates are finally 
combined and averaged, leading to the estimate of the image. 

GMM describes local image patches with a mixture of Gaussian distributions. Assume there exist K 
Gaussian distributions {^yV {Hk^k)} i<k<K parametrized by their means ju* and covariances Z^. Each image 
patch f, is independently drawn from one of these Gaussians with an unknown index k, whose probability 
density function is 



Estimating {f;}i<;</ from {y,}i</</ can then be casted into the following problems: 

• Estimate the Gaussian parameters {(Hk,^k)}i<k<K, from the degraded data {ji}i<i<i- 

• Identify the Gaussian distribution kj that generates the patch i, VI < i < I. 

• Estimate f, from its corresponding Gaussian distribution VI <i<I. 

These problems are overall non-convex. The next section will present a maximum a posteriori expectation- 
maximization (MAP-EM) algorithm that calculates a local-minimum solution Q. 
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B. MAP-EM Algorithm 

Following an initialization, addressed in Section IIII-CI the MAP-EM algorithm is an iterative procedure 
that alternates between two steps: 

• In the E-step, assuming that the estimates of the Gaussian parameters {(p.k,^k)}i<k<K are known 
(following the previous M-step), for each patch one calculates the maximum a posteriori (MAP) 
estimates with all the Gaussian models, and selects the best Gaussian model k\ to obtain the estimate 
of the patch f, = fv' . 

• In the M-step, assuming that the Gaussian model selection k, and the signal estimate f,, V/, are known 
(following the previous E-step), one estimates (updates) the Gaussian models {(/i/ti^yt)}i<yt<*:- 

1 ) E-step: Signal Estimation and Model Selection: In the E-step, the estimates of the Gaussian parameters 
{(fLk^k)}i<k<K are assumed to be known. To simplify the notation, we assume without loss of generality 
that the Gaussians have zero means fit = 0, as one can always center the image patches with respect to the 
means. 

For each image patch i, the signal estimation and model selection is calculated to maximize the log 
a-posteriori probability log p(ti\ji, t<k)- 

(f h ki) = arg max log p{ij | y h t k ) = argmax (log p(ji\fi,t k ) +log/?(f,|E fc )) 

= argmin(||Uif i --y ( || 2 + a 2 ff^ 1 f ( + a 2 log|l yt |), (3) 

where the second equality follows the Bayes rule and the third one is derived with the assumption that 
w, ~ ^V(0,o 2 Id), with Id the identity matrix, and f, ~ ^(O,^)- 

The maximization is first calculated over f, and then over k. Given a Gaussian signal model f, ~ ,JV ^0,1^), 
it is well known that the MAP estimate 

ff = argmin (H^f; - y ; || 2 + a 2 ff £^f«) (4) 

minimizes the risk £[||ff — f/|| 2 ] ll45l . One can verify that the solution to (0]) can be calculated with a linear 
filtering 

? = W* s yf, (5) 

where 

W )t ,. = (UfU, + (J%- 1 )- 1 Uf (6) 

is a Wiener filter matrix. Since Uf U; is semi -positive definite, Uj U,- + <7 2 S^ 1 is positive definite and its 
inverse is well defined, if is full rank. 

The best Gaussian model that generates the maximum MAP probability among all the models is then 
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selected with the estimated f* 

*, = argmin (||U,-ff - y|| 2 + C7 2 (f? ) r E^ff + C7 2 log |E*|) . (7) 
The signal estimate is obtained by plugging in the best model k, in the MAP estimate dl} 

f«=f?. (8) 

The whole E-step is basically calculated with a set of linear filters. For typical applications such as 
zooming and deblurring where the degradation operators U; are translation-invariant and do not depend on 
the patch index i, i.e., U, = U, the Wiener filter matrices = W,t © can be precomputed for the K 
Gaussian distributions. Calculating ((5]) thus requires only 2N 2 floating-point operations (flops), where N is 
the image patch size. For a translation-variant degradation U,-, random masking for example, W,t ; needs to be 
calculated at each position where U; changes. Since Uf U,- + cr 2 £^ 1 is positive definite, the matrix inversion 
can be implemented with N 3 /3 + 2N 2 « jV 3 /3 flops through a Cholesky factorization |9 j. All this makes the 
E-step computationally efficient. 

2) M-step: Model Estimation: In the M-step, the Gaussian model selection and the signal estimate f, 
of all the patches are assumed to be known. Let ^ be the ensemble of the patch indices i that are assigned 
to the k-th Gaussian model, i.e., ^ = {i\ki = k}, and let |^| be its cardinality. The parameters of each 
Gaussian model are estimated with the maximum likelihood estimate using all the patches assigned to that 
Gaussian cluster, 

(flk,£k) = argmaxlogp({f i } i - e ^ (9) 
With the Gaussian model © , one can easily verify that the resulting estimate is the empirical estimate 

A* = w\ I ~ fi and ^ k = wilL (ii-h)(ii-h) T - do) 

The empirical covariance estimate may be improved through regularization when there is lack of data B71 
(for typical patch size 8x8, the dimension of the covariance matrix Z,t is 64 x 64, while the |^| is typically 
in the order of a few hundred). A simple and standard eigenvalue-based regularization is used here, 
Zjt + eld, where £ is a small constant. The regularization also guarantees that the estimate of the covariance 
matrix is full-rank, so that the Wiener filter ((6]) is always well defined. This is important for the Gaussian 
model selection © as well, since if is not full rank, then log | | — > — °°, biasing the model selection. 
The computational complexity of the M-step is negligible with respect to the E-step. 

As the MAP-EM algorithm described above iterates, the MAP probability of the observed signals 
p({fi}i<i<i\{yi}i<i<i,{h^k}i<k<K) always increases. This can be observed by interpreting the E- and M- 
steps as a coordinate descent optimization [34]. In the experiments, the convergence of the patch clustering 



June 16, 2010 



DRAFT 



6 



and resulting PSNR is always observed. 



III. PLE and Structured Sparse Estimation 



The MAP-EM algorithm described above requires an initialization. A good initialization is highly important 
for iterative algorithms that try to solve non-convex problems, and remains an active research topic |5], [29]. 
This section describes a dual structured sparse interpretation of GMM and MAP-EM, which suggests an 
effective dictionary motivated initialization for the MAP-EM algorithm. Moreover, it shows that the resulting 
piecewise linear estimate stabilizes traditional sparse inverse problem estimation. 

The sparse inverse problem estimation approaches will be first reviewed. After describing the connection 
between MAP-EM and structured sparsity via estimation in PCA bases, an intuitive and effective initialization 
will be presented. 

A. Sparse Inverse Problem Estimation 

Traditional sparse super-resolution estimation in dictionaries provides effective non-parametric approaches 
to inverse problems, although the coherence of the dictionary and their large degree of freedom may become 
sources of instability and errors. These algorithms are briefly reviewed in this section. "Super-resolution" is 
loosely used here as these approaches try to recover information that is lost after the degradation. 

A signal f E M. N is estimated by taking advantage of prior information which specifies a dictionary D G 
R A ' x l r l, having |T| columns corresponding to atoms {0 m } me r> where f has a sparse approximation. This 
dictionary may be a basis or some redundant frame, with |T| > N. Sparsity means that f is well approximated 
by its orthogonal projection fA over a subspace Va generated by a small number |A| <C |T| of column vectors 
{<Pm}meA of D: 



where a S Rl r l is the transform coefficient vector, a • 1a selects the coefficients in A and sets the others to 
zero, D(a-lA) multiplies the matrix D with the vector a 1a, and ||£a|| 2 ^ ||f|| 2 is a small approximation 
error. 

Sparse inversion algorithms try to estimate from the degraded signal y = Uf + w the support A and the 
coefficients a in A that specify the projection of f in the approximation space Va- It results from (fTTT> that 



This means that y is well approximated by the same sparse set A of atoms and the same coefficients a in 
the transformed dictionary UD, whose columns are the transformed vectors {U<p m }mer- 

Since U is not an invertible operator, the transformed dictionary UD is redundant, with column vectors 
which are linearly dependent. It results that y has an infinite number of possible decompositions in UD. 



f=f A + £A = D(a-l A ) + £A, 



(11) 



y = UD(a-l A ) + e', with e' = U£ + w. 



(12) 
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A sparse approximation y = UDa of y can be calculated with a basis pursuit algorithm which minimizes a 
Lagrangian penalized by a sparse h norm [16], [61] 

a = argmin||UDa-y|| 2 + A llalli, (13) 

a 

or with faster greedy matching pursuit algorithms l48l . The resulting sparse estimation of f is 

f=Da. (14) 

As we explain next, this simple approach is not straightforward and often not as effective as it seems. The 
Restrictive Isometry Property of Candes and Tao |[T2l and Donoho ET1 is a strong sufficient condition which 
guarantees the correctness of the penalized l\ estimation. This restrictive isometry property is valid for certain 
classes of operators U, but not for important structured operators such as subsampling on a uniform grid or 
convolution. For structured operators, the precision and stability of this sparse inverse estimation depends 
upon the "geometry" of the approximation support A of f, which is not well understood mathematically, 
despite some sufficient exact recovery conditions proved for example by Tropp [62], and many others (mostly 
related to the coherence of the equivalent dictionary). Nevertheless, some necessary qualitative conditions 
for a precise and stable sparse super-resolution estimate (fT4l) can be deduced as follows l45l . B71 : 

• Sparsity. D provides a sparse representation for f. 

• Recoverability. The atoms have non negligible norms ||U0 m || 2 3> 0. If the degradation operator U 
applied to <j) m leaves no "trace," the corresponding coefficient a[ni] can not be recovered from y with ( [TBI . 
We will see in the next subsection that this recoverability property of transformed relevant atoms having 
sufficient energy is critical for the GMM/MAP-EM introduced in the previous section as well. 

• Stability. The transformed dictionary UD is incoherent enough. Sparse inverse problem estimation may 
be unstable if some columns {U<p m }mer in UD are too similar. To see this, let us imagine a toy example, 
where a constant-value atom and a highly oscillatory atom (with values —1,1,-1,1,...), after a x2 
subsampling, become identical. The sparse estimation (TT3l can not distinguish between them, which 
results in an unstable inverse problem estimate ([T4l . The coherence of UD depends on D as well as on 
the operator U. Regular operators U such as subsampling on a uniform grid and convolution, usually 
lead to a coherent UD, which makes accurate inverse problem estimation difficult. 

Several authors have applied this sparse super-resolution framework (fT"3l and ([141 for image inverse 
problems. Sparse estimation in dictionaries of curvelet frames and DCT have been applied successfully 
to image inpainting ll24l . ll27l . (33). However, for uniform grid interpolations, Section [VI] shows that the 
resulting interpolation estimations are not as precise as simple linear bicubic interpolations. A contourlet 
zooming algorithm ||5T| can provide a slightly better PSNR than a bicubic interpolation, but the results are 
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considerably below the state-of-the-art. Learned dictionaries of image patches have generated good inpainting 
results ||43l , fTTI . In some recent works sparse super-resolution algorithms with learned dictionary have been 
studied for zooming and deblurring ATI . |[65l . As shown in sections IVTl and [VTTl although they sometimes 
produce good visual quality, they often generate artifacts and the resulting PSNRs are not as good as more 
standard methods. 

Another source of instability of these algorithms comes from their full degree of freedom. The non-linear 
approximation space Va is estimated by selecting the approximation support A, with basically no constraint. 
A selection of |A| atoms from a dictionary of size |T| thus corresponds to a choice of an approximation space 
among (j^j) possible subspaces. In a local patch-based sparse estimation with 8x8 patch size, typical values 
of |r| = 256 and |A| = 8 lead to a huge degree of freedom ( 2 g 6 ) ~ 10 14 , further stressing the inaccuracy of 
estimating a from an UD. 

These issues are addressed with the proposed PLE framework and its mathematical connection with 
structured sparse models described next. 

B. Structured Sparse Estimation in PCA bases 

The PCA bases bridge the GMM/MAP-EM framework presented in Section [II] with the sparse estimation 
described above. For signals {f,} following a statistical distribution, a PCA basis is defined as the matrix 
that diagonalizes the data covariance matrix = E[fiff], 



where is the PCA basis and = diag(Af, . . . , X^) is a diagonal matrix, whose diagonal elements X* > 
X\ > . . . > X^ are the sorted eigenvalues. It can be shown that the PCA basis is orthonormal, i.e., BytB^ = Id, 
each of its columns 0™, 1 < m < N, being an atom that represents one principal direction. The eigenvalues 
are non-negative, X,„ > 0, and measure the energy of the signals {f,} in each of the principal directions [45]. 

Transforming f, from the canonical basis to the PCA basis a* = B^f,-, one can verify that the MAP 
estimate ((U)-© can be equivalently calculated as 



where, following simple algebra and calculus, the MAP estimate of the PCA coefficients af is obtained by 



Comparing (fTTT ) with ( fl3T ), the MAP-EM estimation can thus be interpreted as a structured sparse es- 
timation. As illustrated in Figure [T] the proposed dictionary has the advantage of the traditional learned 
overcomplete dictionaries being overcomplete, and adapted to the image under test thanks to the Gaussian 



— B^Bj. , 



(15) 




(16) 




(17) 
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Fig. 1. Left: Traditional overcomplete dictionary. Each column represents an atom in the dictionary. Non-linear 
estimation has the full degree of freedom to select any combination of atoms (marked by the columns in red). Right: 
The underlying structured sparse piecewise linear dictionary of the proposed approach. The dictionary is composed of 
a family of PCA bases whose atoms are pre-ordered by their associated eigenvalues. For each image patch, an optimal 
linear estimator is calculated in each PCA basis and the best linear estimate among the bases is selected (marked by 
the basis in red). 

model estimation in the M-step (which is equivalent to updating the PCAs), but the resulting piecewise 
linear estimator (PLE) is more structured than the traditional nonlinear sparse estimation. PLE is calculated 
with a linear estimation in each basis and a non-linear best basis selection: 

• Nonlinear block sparsity. The dictionary is composed of a union of K PCA bases. To represent an 
image patch, the non-linear model selection ([3]) in the E-step restricts the estimation to only one basis 
(N atoms out of KN selected in group), and has a degree of freedom equal to K, sharply reduced from 
that in the traditional sparse estimation which has the full degree of freedom in atom selection. 

• Linear collaborative filtering. Inside each PCA basis, the atoms are pre-ordered by their associated 
eigenvalues (which decay very fast as we will later see, leading to sparsity inside the block as well). In 
contrast to the non-linear sparse l\ estimation ( fT3l , the MAP estimate ( fT71 ) implements the regularization 
with the h norm of the coefficients weighted by the eigenvalues {A^}i< m <A?, and is calculated with a 
linear filtering (0 ©. The eigenvalues are computed from all the signals {f,} in the same Gaussian 
distribution class. The resulting estimation therefore implements a collaborative filtering which incorpo- 
rates the information from all the signals in the same cluster HI. The weighting scheme privileges the 
coefficients a,[m] corresponding to the principal directions with large eigenvalues X m , where the energy 
is likely to be high, and penalizes the others. For the ill-posed inverse problems, the collaborative prior 
information incorporated in the eigenvalues {A*,}i< m <Ar further stabilizes the estimate. (Note that this 
collaborative weighting is fundamentally different than the standard one used in iterative weighted I2 
approaches to sparse coding [20].) 

As described in Section JIJ the complexity of the MAP-EM algorithm is dominated by the E-step. For an 
image patch size of y/N x (typical value 8 x 8), it costs 2KN 2 flops for translation-invariant degradation 
operators such as uniform subsampling and convolution, and KN 3 /3 flops for translation- variant operators 
such as random masking, where K is the number of PCA bases. The overall complexity is therefore tightly 
upper bounded by &(2LKN 2 ) or &(LKN 3 /3), where L is the number of iterations. As will be shown in 
Section |IVJ the algorithm converges fast for image inverse problems, typically in L = 3 to 5 iterations. On 
the other hand, the complexity of the l\ minimization with the same dictionary is &(KN 3 ), with typically 
a large factor in front as the l\ converges slowly in practice. The MAP-EM algorithm is thus typically one 
or two orders of magnitude faster than the sparse estimation. 
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To conclude, let as come back to the recoverability property mentioned in the previous section. We see 
from ( fTTT ) that if an eigenvector of the covariance matrix is killed by the operator Uj, then its contribution to 
the recovery of y, is virtually null, while it pays a price proportional to the corresponding eigenvalue. Then, 
it will not be used in the optimization (fTTT ). and thereby in the reconstruction of the signal following (fT6l ). 
This means that the wrong model might be selected and an improper reconstruction obtained. This further 
stresses the importance of a correct design of dictionary elements, which from the description just presented, 
it is equivalent to the correct design of the covariance matrix, including the initialization, which is described 
next. 

C. Initialization of MAP-EM 

The PC A formulation just described not only reveals the connection between the MAP-EM estimator and 
structured sparse modeling, but it is crucial for understanding how to initialize the Gaussian models as well. 

1 ) Sparsity: As explained in Section IIII-Al for the sparse inverse problem estimation model to have the 
super-resolution ability, the first requirement on the dictionary is to be able to provide sparse representations 
of the image. It has been shown that capturing image directional regularity is highly important for sparse 
representations O, ifTTIl . |46ll . In dictionary learning, for example, most prominent atoms look like local edges 
good at representing contours, as illustrated in Figure |2]-(a). Therefore the initial PCAs in our framework, 
which following ( fl5l ) will lead to the initial Gaussians, are designed to capture image directional regularity. 




(a) (b) (c) (d) 



Fig. 2. (a) Some typical dictionary atoms learned from the image Lena (Figure [3]-(a)) with K-SVD [2]. (b)-(d) A 
numerical procedure to obtain the initial directional PCAs. (b) A synthetic edge image. Patches (8 x 8) that touch the 
edge are used to calculate an initial PCA basis, (c) The first 8 atoms in the PCA basis with the largest eigenvalues, 
(d) Typical eigenvalues. 

The initial directional PCA bases are calculated following a simple numerical procedure. Directions from 
to % are uniformly sampled to K angles, and one PCA basis is calculated per angle. The calculation 
of the PCA at an angle 6 uses a synthetic blank-and-white edge image following the same direction, as 
illustrated in Figure |2]-(b). Local patches that touch the contour are collected and are used to calculate the 
PCA basis (following (fTOl) and (fT5T)). The first atom, which is almost DC, is replaced by DC, and a Gram- 
Schmidt orthogonalization is calculated on the other atoms to ensure the orthogonality of the basis. The 
patches contain edges that are translation-invariant. As the covariance of a stationary process is diagonalized 
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by the Fourier basis, unsurprisingly, the resulting PCA basis has first few important atoms similar to the 
cosines atoms oscillating in the direction 6 from low-frequency to high-frequency, as shown in Figure |2]-(c). 
Comparing with the Fourier vectors, these PCAs enjoy the advantage of being free of the periodic boundary 
issue, so that they can provide sparse representations for local image patches. The eigenvalues of all the bases 
are initiated with the same ones obtained from the synthetic contour image, that have fast decay, Figure |2]-(d). 
These, following (031 ). complete the covariance initialization. The Gaussian means are initialized with zeros. 

It is worth noting that this directional PCA basis not only provides sparse representations for contours 
and edges, but it captures well textures of the same directionality as well. Indeed, in a space of dimension 
N corresponding to patches of size yN x \/N, the first about yN atoms illustrated in Figure [2]-(c) absorb 
most of the energy in local patterns following the same direction in real images, as indicated by the fast 
decay of the eigenvalues (very similar to Figure l2-(d)). 

A typical patch size is y/N x y/N = 8 x 8, as selected in previous works Q, [23]. The number of directions 
in a local patch is limited due to the pixelization. The DCT basis is also included in competition with the 
directional bases to capture isotropic image patterns. Our experiments have shown that in image inverse 
problems, there is a significant average gain in PSNR when K grows from to 3 (when K = 0, the dictionary 
is initialized with only a DCT basis and all the patches are assigned to the same cluster), which shows that 
one Gaussian model, or equivalently a single linear estimator, is not enough to accurately describe the 
image. When K increases, the gain reduces and gets stabilized at about K = 36. Compromising between 
performance and complexity, K = 18, which corresponds to a 10° angle sampling step, is selected in all the 
future experiments. 

Figures [3]-(a) and (b) illustrates the Lena image and the corresponding patch clustering, i.e., the model 
selection ki, obtained for the above initialization, calculated with (O in the E-step described in Section JI] 
The patches are densely overlapped and each pixel in Figure (3]-(b) represents the model kj selected for the 
8x8 patch around it, different colors encoding different values of kj, from 1 to 19 (18 directions plus a 
DCT). One can observe, for example on the edges of the hat, that patches where the image patterns follow 
similar directions are clustered together, as expected. Let us note that on the uniform regions such as the 
background, where there is no directional preference, all the bases provide equally sparse representations. 
As the log|Lfc| = n^ =1 A^, term in the model selection © is initialized as identical for all the Gaussian 
models, the clustering is random is these regions. The clustering will improve as the MAP-EM progresses. 

2 ) Recover ability: The oscillatory atoms illustrated in Figure [2]-(c) are spread in space. Therefore, for 
diagonal operators in space such as masking and subsampling, they satisfy well the recoverability condition 
||U^|| 2 > for super-resolution described in Section ITlI-AI However, as these oscillatory atoms have Dirac 
supports in Fourier, for convolution operators, the recoverability condition is violated. For convolution 
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Fig. 3. (a). Lena image, ((b) to (d) are color images.) (b). Patch clustering obtained with the initial directional PCAs 
(see Figure |2}(c)). The patches are densely overlapped and each pixel represents the model kj selected for the 8x8 
patch around it, different colors encoding different direction values of from 1 to K = 19. (c). Patch clustering 
obtained with the initial position PCAs (see Figure 2]). Different colors encoding different position values of from 
1 to P = 12. (d) and (e). Patch clustering with respectively directional and position PCAs after the 2nd iteration. 



operators U, ||U<^|| 2 >0 requires that the atoms have spread Fourier spectrum, and therefore be localized 
in space. Following a similar numerical scheme as described above, patches touching the edge at a fixed 
position are extracted from synthetic edge images with different amounts of blur. The resulting PCA basis, 
named position PCA basis hereafter, contains localized atoms of different polarities and at different scales, 
following the same direction 6 , as illustrated in Figure |4] (which look like wavelets along the appropriate 
direction). For each direction 6, a family of localized PCA bases {Rk,p}i<p<p w& calculated at all the 
positions translating within the patch. The eigenvalues are initialized with the same fast decay ones as 
illustrated in Figure [2]-(d) for all the position PCA bases. Each pixel in Figure [3]-(c) represents the model 
Pi selected for the 8x8 patch around it, different colors encoding different position values of p t , from 1 to 
12. The rainbow-like color transitions on the edges show that the position bases are accurately fitted to the 
image structures. 




Fig. 4. The first 8 atoms in the position PCA basis with the largest eigenvalues. 

3) Wiener Filtering Interpretation: Figure [5] illustrates some typical Wiener filters, which are the rows 
of Wjt in ([6]), calculated with the initial PCA bases described above for zooming and deblurring. The filters 
have intuitive interpretations, for example directional interpolator for zooming and directional deconvolution 
for deblurring, confirming the effectiveness of the initialization. 

D. Additional comments on related works 

Before proceeding with experimental results and applications, let us further comment on some related 
works, in addition to those already addressed in Section U 

The MAP-EM algorithm using various probability distributions such as Gaussian, Laplacian, Gamma 
and Gibbs have been widely applied in medical image reconstruction and analysis (see for example flTOl , 
[40]). Following the Gaussian mixture models, MAP-EM alternates between image patch estimation and 
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(a) (b) (c) (d) 

Fig. 5. Some filters generated by the MAP estimator, (a) and (b) are for image zooming, where the degradation operator 
U is a 2 x 2 subsampling operator. Gray-level from white to black: values from negative to positive, (a) is computed 
with a Gaussian distribution whose PCA basis is a DCT basis, and it implements an isotropic interpolator, (b) is 
computed with a Gaussian distribution whose PCA basis is a directional PCA basis (angle 9 — 30°), and it implements 
a directional interpolator, (c) and (d) are shown in Fourier and are for image deblurring, where the degradation operator 
U is a Gaussian convolution operator. Gray-level from white to black: Fourier modules from zero to positive, (c) is 
computed with a Gaussian distribution whose PCA basis is a DCT basis, and it implements an isotropic deblurring 
filter, (d) is computed with a Gaussian distribution whose PCA basis is a directional PCA basis (angle 9 — 30°, at a 
fixed position), and it implements a directional deblurring filter. 

clustering, and Gaussian models estimation. Clustering-based estimation has been shown effective for image 
restoration. To achieve accurate clustering-based estimation, an appropriate clustering is at the heart. In 
a denoising setting where images are noisy but not degraded by the linear operator U, clustering with 
block matching, i.e., calculating Euclidian distance between image patch gray-levels ifTOl , ifTTl , [42], and 
with image segmentation algorithms such as k-means on local image features Ifl51l . have been shown to 
improve the denoising results. For inverse problems where the observed images are degraded, for example 
images with holes in an inpainting setting, clustering becomes more difficult. The generalized PCA [64] 
models and segments data using an algebraic subspace clustering technique based on polynomial fitting and 
differentiation, and while it has been shown effective in image segmentation, it does not reach state-of-the- 
art in image restoration. In the recent non-parametric Bayesian approach fTTl . an image patch clustering is 
implemented with probability models, which improves the denoising and inpainting results, although still 
under performing, in quality and computational cost, the framework here introduced. The clustering in the 
MAP-EM procedure enjoys the advantage of being completely consistent with the signal estimation, and in 
consequence leads to state-of-the-art results in a number of imaging inverse problem applications. 

IV. Initial Supportive Experiments 

Before proceeding with detailed experimental results for a number of applications of the proposed frame- 
work, this section shows through some basic experiments the effectiveness and importance of the initialization 
proposed above, the evolution of the representations as the MAP-EM algorithm iterates, as well as the 
improvement brought by the structure in PLE with respect to traditional sparse estimation. 

Following some recent works, e.g., H4l . an image is decomposed into 128 x 128 regions, each region 
treated with the MAP-EM algorithm separately. The idea is that image contents are often more coherent 
semi-locally than globally, and Gaussian model estimation or dictionary learning can be slightly improved 
in semi-local regions. This also saves memory and enables the processing to proceed as the image is being 
transmitted. Parallel processing on image regions is also possible when the whole image is available. Regions 
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are half-overlapped to eliminate the boundary effect between the regions, and their estimates are averaged 
at the end to obtain the final estimate. 

A. Initialization 

Different initializations are compared in the context of different inverse problems, inpainting, zooming 
and deblurring. The reported experiments are performed on some typical image regions, Lena's hat with 
sharp contours and Barbara's cloth rich in texture, as illustrated in Figure [6] 

Inpainting. In the addressed case of inpainting, the image is degraded by U, that is a random masking 
operator which randomly sets pixel values to zeros. The initialization described above is compared with a 
random initialization, which initializes in the E-step all the missing pixel value with zeros and starts with 
a random patch clustering. Figure |6]-(a) and (b) compare the PSNRs obtained by the MAP-EM algorithm 
with those two initializations. The algorithm with the random initialization converges to a PSNR close to, 
about 0.4 dB lower than, that with the proposed initialization, and the convergence takes much longer time 
(about 6 iterations) than the latter (about 3 iterations). 

It is worth noting that on the contours of Lena's hat, with the proposed initialization the resulting PSNR 
is stable from the initialization, which already produces accurate estimation, since the initial directional PCA 
bases themselves are calculated over synthetic contour images, as described in Section IIII-CI 
Zooming. In the context of zooming, the degradation U is a subsampling operator on a uniform grid, much 
structured than that for inpainting. The MAP-EM algorithm with the random initialization completely fails 
to work: It gets stuck in the initialization and does not lead to any changes on the degraded image. Instead 
of initializing the missing pixels with zeros, a bicubic initialization is tested, which initializes the missing 
pixels with bicubic interpolation. Figure [6]-(c) shows that, as the MAP-EM algorithm iterates, it significantly 
improves the PSNR over the bicubic initialization, however, the PSNR after a slower convergence is still 
about 0.5 dB lower than that obtained with the proposed initialization. 

Deblurring. In the deblurring setting, the degradation U is a convolution operator, which is very structured, 
and the image is further contaminated with a white Gaussian noise. Four initializations are under considera- 
tion: the initialization with directional PCAs (K directions plus a DCT basis), which is exactly the same as 
that for inpainting and zooming tasks, the proposed initialization with the position PCA bases for deblurring 
as described in Section ITII-C2I CP positions per each of the K directions, all with the same eigenvalues as for 
the directional PCAs initialization), and two random initializations with the blurred image itself as the initial 
estimate and a random patch clustering with, respectively, K+l and (K+l)P clusters. As illustrated in 
Figure [6]-(d), the algorithm with the directional PCAs initialization gets stuck in a local minimum since the 
second iteration, and converges to a PSNR 1.5 dB lower than that with the initialization using the position 
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PCAs. Indeed, since the recoverability condition for deblurring, as explained in Section IIII-C2L is violated 
with just directional PCA bases, the resulting images remain still quite blurred. The random initialization 
with (K+l)P clusters results in better results than with K+l clusters, which is 0.7 dB worse than the 
proposed initialization with position PCAs. 

These experiments confirm the importance of the initialization in the MAP-EM algorithm to solve inverse 
problems. The sparse coding dual interpretation of GMM/MAP-EM helps to deduce effective initializations 
for different inverse problems. While for inpainting with random masking operators, trivial initializations 
slowly converge to a solution moderately worse than that obtained with the proposed initialization, for more 
structured degradation operators such as uniform subsampling and convolution, simple initializations either 
completely fail to work or lead to significantly worse results than with the proposed initialization. 




No. ileration No. iteration No. Iteration No. Iteration 

(a) (b) (c) (d) 

Fig. 6. PSNR comparison of the MAP-EM algorithm with different initializations on different inverse problems. The 
horizontal axis corresponds to the number of iterations, (a) and (b). Inpainting with 50% and 30% available data, on 
Lena's hat and Barbara's cloth. The initializations under consideration are the random initialization and the initialization 
with directional PCA bases, (c) Zooming, on Lena's hat. The initializations under consideration are bicubic initialization 
and the initialization with directional PCA bases. (Random initialization completely fails to work.) (d) Deblurring, on 
Lena's hat. The initializations under consideration are the initialization with directional PCAs (K directions plus a 
DCT basis), the initialization with the position PCA bases (P positions per each of the K directions), and two random 
initializations with the blurred image itself as the initial estimate and a random patch clustering with, respectively, 
K+l (rand. 1) and (K+1)P (rand. 2) clusters. See text for more details. 

B. Evolution of Representations 

Figure|7]illustrates, in an inpainting context on Barbara's cloth, which is rich in texture, the evolution of the 
patch clustering as well as that of a typical PCA bases as the MAP-EM algorithm iterates. The clustering gets 
cleaned up as the algorithm iterates. (See figures [U-(d) and (e) for another example.) Some high-frequency 
atoms are promoted to better capture the oscillatory patterns, resulting in a significant PSNR improvement 
of more than 3 dB. On contour images such as Lena's hat illustrated in Figure [6l on the contrary, although 
the patch clustering is cleaned up as the algorithm iterates, the resulting local PSNR evolves little after the 
initialization, which already produces accurate estimation, since the directional PCA bases themselves are 
calculated over synthetic contour images, as described in Section IIII-CI The eigenvalues have always fast 
decay as the iteration goes on, visually similar to the plot in Figure [2j-(d). The resulting PSNRs typically 
converge in 3 to 5 iterations. 
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(a) (b) (c) (d) (e) 

Fig. 7. Evolution of the representations, (a) The original image cropped from Barbara, (b) The image masked with 
30% available data, (c) and (d) are color images, (c) Bottom: The first few atoms of an initial PCA basis corresponding 
to the texture on the right of the image. Top: The resulting patch clustering after the 1st iteration. Different colors 
represent different clusters, (d) Bottom: The first few atoms of the PCA basis updated after the 1st iteration. Top: The 
resulting patch clustering after the 2nd iteration, (e) The inpainting estimate after the 2nd iteration (32.30 dB). 



C. Estimation Methods 




(a) Original image. (b) Low-resolution image. (c) Global h ■ 22.70 dB (d) Global OMP: 28.24 dB 




(e) Block li: 26.35 dB (f) Block OMP: 29.27 dB (g) Block weighted h: 35.94 dB (h) Block weighted I 2 : 36.45 dB 



Fig. 8. Comparison of different estimation methods on super-resolution zooming, (a) The original image cropped 
from Lena, (b) The low-resolution image, shown at the same scale by pixel duplication. From (c) to (h) are the 
super-resolution results obtained with different estimation methods. See text for more details. 

From the sparse coding point of view, the gain of introducing structure in sparse inverse problem estimation 
as described in Section |III]is now shown through some experiments. An overcomplete dictionary D composed 
of a family of PCA bases {Bk}i<k<K> illustrated in Figure [D-(b), is learned as described in Section JIJ and 
is then fed to the following estimation schemes, (i) Global l\ and OMP: the ensemble of D is used as 
an overcomplete dictionary, and the zooming estimation is calculated with the sparse estimate (TT3T > through, 
respectively, an Zi minimization or an orthogonal matching pursuit (OMP). (ii) Block l\ and OMP: the 
sparse estimate is calculated in each PCA basis through, respectively an l\ minimization and an OMP, 
and the best estimate is selected with a model selection procedure similar to (0, thereby reducing the degree 
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of freedom in the estimation with respect to the global l\ and OMR ||6*71 . (iii) Block weighted l\\ on top 
of the block l\ , weights are included for each coefficient amplitude in the regularizer, 

a? = argmin(||U,B,a ; -y ( || 2 + a 2 £ , (18) 

with the weig hts x k m = (A*) 1 / 2 , where A* are the eigenvalues of the &-fh PCA basis. The weighting scheme 
penalizes the atoms that are less likely to be important, following the spirit of the weighted I2 deduced from 
the MAP estimate, (iv) Block weighted 1%: the proposed PLE. Comparing with ( fT8l ), the difference is that 
the weighted I2 (fT71 ) takes the place of the weighted l\, thereby transforming the problem into a stable and 
computationally efficient piecewise linear estimation. 

The comparison on a typical region of Lena in the 2x2 image zooming context is shown in Figure [8] 
The global h and OMP produce some clear artifacts along the contours, which degrade the PSNRs. The 
block l\ or OMP considerably improves the results (especially for Comparing with the block h or OMP, 
a very significant improvement is achieved by adding the collaborative weights on top of the block l\ . The 
proposed PLE with the block weighted I2, computed with linear filtering, further improves the estimation 
accuracy over the block weighted l\ , with a much lower computational cost. 

In the following sections, PLE will be applied to a number of inverse problems, including image inpainting, 
zooming and deblurring. The experiments are performed on some standard gray-level and color images, 
illustrated in Figure [9) 




Fig. 9. Images used in the numerical experiments. From top to bottom, left to right. The first eight are gray-level images: 
Lena (512 x 512), Barbara (512 x 512), Peppers (512 x 512), Mandril (512 x 512), House (256 x 256), Cameraman 
(256 x 256), Boats(512 x 512), and Straws (640 x 640). The rest are color images: Castle (481 x 321), Mushroom 
(481 x 321), Kangaroo (321 x 481), Train (321 x 481), Horses (321 x 481), Kodak05 (512 x 768), Kodak20 (512x768), 
Girl (258 x 255), and Flower (171 x 330). 



V. Inpainting 

In the addressed case of inpainting, the original image f is masked with a random mask, y = Uf, where U 
is a diagonal matrix whose diagonal entries are randomly either 1 or 0, keeping or killing the corresponding 
pixels. Note that this can be considered as a particular case of compressed sensing, or when collectively 
considering all the image patches, as matrix completion (and as here demonstrated, in contrast with the 
recent literature on the subject, a single subspace is not sufficient, see also ff\\ ). 
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The experiments are performed on the gray-level images Lena, Barbara, House, and Boat, and the color 
images Castle, Mushroom, Train and Horses. Uniform random masks that retain 80%, 50%, 30% and 20% 
of the pixels are used. The masked images are then inpainted with the algorithms under consideration. 

For gray-level images, the image patch size is \/N x v^V = 8x8 when the available data is 80%, 50%, 
and 30%. Larger patches of size 12 x 12 are used when images are heavily masked with only 20% pixels 
available. For color images, patches of size y/N x \/N x 3 throughout the RGB color channels are used to 
exploit the redundancy among the channels ll43l . To simplify the initialization in color image processing, 
the E-step in the first iteration is calculated with "gray-level" patches of size y/N x y/N on each channel, but 
with a unified model selection across the channels: The same model selection is performed throughout the 
channels by minimizing the sum of the model selection energy ([7]) over all the channels; the signal estimation 
is calculated in each channel separately. The M-step then estimates the Gaussian models with the "color" 
patches of size y/N x y/N x 3 based on the model selection and the signal estimate previously obtained in 
the E-step. Starting from the second iteration, both the E- and M-steps are calculated with "color" patches, 
treating the \/~N x \/~N x 3 patches as vectors of size 3N. y/N is set to 6 for color images, as in the previous 
works 11431 . ifTTTl . The MAP-EM algorithm runs for 5 iterations. The noise standard deviation a is set to 
3, which corresponds to the typical noise level in these images. The small constant e in the covariance 
regularization is set to 30 in all the experiments. 

The PLE inpainting is compared with a number of recent methods, including "MCA" (morphological 
component analysis) [24], "ASR" (adaptive sparse reconstructions) [33] , "ECM" (expectation conditional 
maximization) (271 , "KR" (kernel regression) |50), "FOE" (fields of experts) [56], "BP" (beta process) fTTTl . 
and "K-SVD" ll43l . MCA and ECM compute the sparse inverse problem estimate in a dictionary that 
combines a curvelet frame ifTTI . a wavelet frame (45l and a local DCT basis. ASR calculates the sparse 
estimate with a local DCT. BP infers a nonparametric Bayesian model from the image under test (noise 
level is automatically estimated). Using a natural image training set, FOE and K-SVD learn respectively a 
Markov random field model and an overcomplete dictionary that gives sparse representation for the images. 
The results of MCA, ECM, KR, FOE are generated by the original authors' softwares, with the parameters 
manually optimized, and those of ASR are calculated with our own implementation. The PSNRs of BP 
and K-SVD are cited from the corresponding papers. K-SVD and BP currently generate the best inpainting 
results in the literature. 

Table HJ-left gives the inpainting results on gray-level images. PLE considerably outperforms the other 
methods in all the cases, with a PSNR improvement of about 2 dB on average over the second best algorithms 
(BP, FOE and MCA). With 20% available data on Barbara, which is rich in textures, it gains as much as 
about 3 dB over MCA, 4 dB over ECM and 6 dB over all the other methods. Let us remark that when the 
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missing data ratio is high, MCA generates quite good results, as it benefits from the curvelet atoms that 
have large support relatively to the local patches used by the other methods. 

Figure [TO] compares the results of different algorithms. All the methods lead to good inpainting results 
on the smooth regions. MCA and KR are good at capturing contour structures. However, when the curvelet 
atoms are not correctly selected, MCA produces noticeable elongated curvelet-like artifacts that degrade 
the visual quality and offset its gain in PSRN (see for example the face of Barbara). MCA restores better 
textures than BP, ASR, FOE and KR. PLE leads to accurate restoration on both the directional structures 
and the textures, producing the best visual quality with the highest PSNRs. An additional PLE inpainting 
examples is shown in Figure [7] 



Data ratio 
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ASR 


ECM 
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FOE* 


BP 
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42.18 
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33.18 
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29.04 
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28.19 


28.53 


28.43 


31.21 
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36.45 
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30% 
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Mushroom 


80% 
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35.36 


20% 


31.56 


32.06 


Train 


80% 


40.73 


44.01 


50% 


32.00 


32.75 
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27.00 


27.46 


20% 


24.59 


24.73 


Horses 


80% 


41.97 


48.83 


50% 


37.27 


38.52 


30% 


32.52 


32.99 


20% 


29.99 


30.26 


Average 


80% 


41.69 


47.59 


50% 


36.15 


37.58 


30% 


31.54 


32.18 


20% 


28.81 


29.28 



TABLE I 

PSNR COMPARISON ON GRAY-LEVEL (LEFT) AND COLOR (RIGHT) IMAGE INPAINTING. FOR EACH IMAGE, 
UNIFORM RANDOM MASKS WITH FOUR AVAILABLE DATA RATIOS ARE TESTED. THE ALGORITHMS UNDER 
CONSIDERATION ARE MCA (241, ASR E3 , ECM §2J\ , KR (601, FOE J561, BP Jill, AND THE PROPOSED PLE 
FRAMEWORK. THE BOTTOM BOX SHOWS THE AVERAGE PSNRS GIVEN BY EACH METHOD OVER ALL THE IMAGES 
AT EACH AVAILABLE DATA RATIO. THE HIGHEST PSNR IN EACH ROW IS IN BOLDFACE. THE ALGORITHMS WITH * 

USE A TRAINING DATASET. 



Table U-right compares the PSNRs of the PLE color image inpainting results with those of BP (the only 
one in the literature that reports the comprehensive comparison in our knowledge). Again, PLE generates 
higher PSNRs in all the cases. While the gain is especially large, at about 6 dB, when the available data 
ratio is high (at 80%), for the other masking rates, it is mostly between 0.5 and 1 dB. Both methods use 
only the image under test to learn the dictionaries. 

Figure [JJ] illustrates the PLE inpainting result on Castle with 20% available data. Calculated with a much 
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(a) Original (b) Masked (c) MCA (24.18 dB) (d) ASR (21.84 dB) 




(e) KR (21.55 dB) (f) FOE (21.92 dB) (g) BP (25.54 dB) (h) PLE (27.65 dB) 

Fig. 10. Gray-level image inpainting. (a) Original image cropped from Barbara, (b) Masked image with 20% available 
data (6.81 dB). From (c) to (g): Image inpainted by different algorithms. Note the overall superior visual quality 
obtained with the proposed approach. The PSNRs are calculated on the cropped images. 

reduced computational complexity, the resulting 30.07 dB PSNR surpasses the highest PSNR, 29.65 dB, 
reported in the literature, produced by K-SVD l43l . that uses a dictionary learned from a natural image 
training set, followed by 29.12 dB given by BP (BP has been recently improved adding spatial coherence 
in the code, unpublished results). As shown in the zoomed region, PLE accurately restores the details of the 
castle from the heavily masked image. Let us remark that inpainting with random masks on color images is 
in general more favorable than on gray-level images, thanks to the information redundancy among the color 
channels. 




(a) Original (b) Masked (c) PLE 



Fig. 11. Color image inpainting. (a) Original image cropped from Castle, (b) Masked image with 20% available data 
(5.44 dB). (c) Image inpainted by PLE (27.30 dB). The PSNR on the overall image obtained with PLE is 30.07 dB, 
higher than the best result reported so far in the literature 29.65 dB [43]. 

VI. Interpolation zooming 

Interpolation zooming is a special case of inpainting with regular subsampling on uniform grids. As 
explained in Section IIII-AI the regular subsampling operator U may result in a highly coherent transformed 
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dictionary UD. Calculating an accurate sparse estimation for interpolation zooming is therefore more difficult 
than that for inpainting with random masks. 

The experiments are performed on the gray-level images Lena, Peppers, Mandril, Cameraman, Boat, and 
Straws, and the color images Lena, Peppers, Kodak05 and Kokad20. The color images are treated in the 
same way as for inpainting. These high-resolution images are down-sampled by a factor 2x2 without 
anti-aliasing filtering. The resulting low-resolution images are aliased, which corresponds to the reality of 
television images that are usually aliased, since this improves their visual perception. The low-resolution 
images are then zoomed by the algorithms under consideration. When the anti-aliasing blurring operator is 
included before subsampling, zooming can be casted as a deconvolution problem and will be addressed in 
Section IVlTl 

The PLE interpolation zooming is compared with linear interpolators (H, 1371 , |63l , |[52ll as well as recent 
super-resolution algorithms "NEDI" (new edge directed interpolation) l39l , "DFDF" (directional filtering 
and data fusion) [68], "KR" (kernel regression) |[60l . "ECM" (expectation conditional maximization) E71 . 
"Contourlet" |51], "ASR" (adaptive sparse reconstructions) (33, "FOE" (fields of experts) [56]], "SR" 
(sparse representation) |[65l . "SAI" (soft-decision adaptive Interpolation) f69l and "SME" (sparse mixing 
estimators) B71 . KR, ECM, ASR and FOE are generic inpainting algorithms that have been described in 
Section [V] NEDI, DFDF and SAI are adaptive directional interpolation methods that take advantage of the 
image directional regularity. Contourlet is a sparse inverse problem estimator as described in Section IIII-AI 
computed in a contourlet frame. SR is also a sparse inverse estimator that learns the dictionaries from 
a training image set. SME is a recent zooming algorithm that exploits directional structured sparsity in 
wavelet representations. Among the previously published algorithms, SAI and SME currently provide the 
best PSNR for spatial image interpolation zooming fiTl . [69]. The results of ASR are generated with our own 
implementation, and those of all the other algorithms are produced by the original authors' softwares, with 
the parameters manually optimized. As the anti-aliasing operator is not included in the interpolation zooming 
model, to obtain correct results with SR, the anti-aliasing filter used in the original authors' SR software is 
deactivated in both dictionary training (with the authors' original training dataset of 92 images) and super- 
resolution estimation. PLE is configured in the same way as for inpainting as described in Section [V] with 
patch size 8 x 8 for gray-level images, and 6 x 6 x 3 for color images. 

Table [II] gives the PSNRs generated by all algorithms on the gray-level and the color images. Bicubic 
interpolation provides nearly the best results among all tested linear interpolators, including cubic splines (63), 
MOMS [8] and others [52], due to the aliasing produced by the down-sampling. PLE gives moderately higher 
PSNRs than SME and SAI for all the images, with one exception where the SAI produces slightly higher 
PSNR. Their gain in PSNR is significantly larger than with all the other algorithms. 
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Figure [T2]compares an interpolated image obtained by the baseline bicubic interpolation and the algorithms 
that generate the highest PSNRs, SAI and PLE. The local PSNRs on the cropped images produced by all 
the methods under consideration are reported as well. Bicubic interpolation produces some blur and jaggy 
artifacts in the zoomed images. These artifacts are reduced to some extent by the NEDI, DFDF, KR and FOE 
algorithms, but the image quality is still lower than with PLE, SAI and SME algorithms, as also reflected in 
the PSNRs. SR yields an image that looks sharp. However, due to the coherence of the transformed dictionary, 
as explained in Section IIII-AI when the approximating atoms are not correctly selected, it produces artifact 
patterns along the contours, which degrade its PSNR. The PLE algorithm restores slightly better than SAI 
and SME on regular geometrical structures, as can be observed on the upper and lower propellers, as well 



as on the fine lines on the side of the plane indicated by the arrows. 





Bicubic 


NEDI 


DFDF 


KR 


ECM 


Contourlet 


ASR 


FOE* 


SAI 


SME 


PLE 


Lena 


33.93 


33.77 


33.91 


33.94 


24.31 


33.92 


33.19 


34.04 


34.68 


34.58 


34.76 


Peppers 


32.83 


33.00 


33.18 


33.15 


23.60 


33.10 


32.33 


31.90 


33.52 


33.52 


33.62 


Mandril 


22.92 


23.16 


22.83 


22.93 


20.34 


22.53 


22.66 


22.99 


23.19 


23.16 


23.27 


Cameraman 


25.37 


25.42 


25.67 


25.51 


19.50 


25.35 


25.33 


25.58 


25.88 


26.26 


26.47 


Boat 


29.24 


29.19 


29.32 


29.18 


22.20 


29.25 


28.96 


29.36 


29.68 


29.76 


29.93 


Straws 


20.53 


20.54 


20.70 


20.76 


17.09 


20.52 


20.54 


20.47 


21.48 


21.61 


21.82 


Ave. gain 





0.04 


0.13 


0.11 


-6.30 


-0.02 


-0.30 


-0.08 


0.60 


0.68 


0.84 





Bicubic 


NEDI 


DFDF 


KR 


FOE* 


SR* 


SAI 


SME 


PLE 


Lena 


32.41 


32.47 


32.46 


32.55 


32.55 


26.42 


32.98 


32.88 


33.53 


Peppers 


30.95 


31.06 


31.24 


31.26 


31.05 


26.43 


31.37 


31.35 


31.88 


Kodak05 


25.82 


25.93 


26.03 


26.09 


26.01 


20.76 


26.91 


26.72 


26.77 


Kodak20 


30.65 


31.06 


31.08 


30.97 


30.84 


25.92 


31.51 


31.38 


31.72 


Ave. gain 





0.17 


0.25 


0.27 


0.16 


-5.07 


0.74 


0.63 


1.02 



TABLE II 

PSNR COMPARISON ON GRAY-LEVEL (TOP) AND COLOR (BOTTOM) IMAGE INTERPOLATION ZOOMING. THE 
ALGORITHMS UNDER CONSIDERATION ARE BICUBIC INTERPOLATION, NEDI [39], DFDF ll68ll . KR ll60l . 
ECM (37), Contourlet Efl, ASR fl33j, FOE 115611, SR Ii65i SAI H59H , SME E7J and the proposed PLE 

FRAMEWORK. THE BOTTOM ROW SHOWS THE AVERAGE GAIN OF EACH METHOD RELATIVE TO THE BICUBIC 
INTERPOLATION. THE HIGHEST PSNR IN EACH ROW IS IN BOLDFACE. THE ALGORITHMS WITH * USE A TRAINING 

DATASET. 



VII. Deblurring 

An image f is blurred and contaminated by additive noise, y = Uf + w, where U is a convolution operator 
and w is the noise. Image deblurring aims at estimating f from the blurred and noisy observation y. 

A. Hierarchical PLE 

As explained in Section IIII-C21 the recoverability condition of sparse super-resolution estimates for 
deblurring requires a dictionary comprising atoms with spread Fourier spectrum and thus localized in space, 
such as the position PCA basis illustrated in Figure 01 To reduce the computational complexity, model 
selection with a hierarchy of directional PCA bases and position PCA bases is proposed, in the same spirit 
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(a) HR (b) LR (c) Bibubic (d) SAI (e) PLE 

Fig. 12. Color image zooming, (a) Crop from the high-resolution image Kodak20. (b) Low-resolution image. From (c) 
to (e), images zoomed by bicubic interpolation (28.48 dB), SAI (30.32 dB) [69 1, and proposed PLE framework (30.64 
dB). PSNRs obtained by the other methods under consideration: NEDI (29.68 dB) (39), DFDF (29.41 dB) (68), KR 
(29.49 dB) El, FOE (28.73 dB) ED, SR (23.85 dB) [65), and SME (29.90 dB) (47). Attention should be focused 
on the places indicated by the arrows. 

of 1166*1 . Figure [T3l(a) illustrates the hierarchical PLE with a cascade of the two layers of model selections. 
The first layer selects the direction, and given the direction, the second layer further specifies the position. 

In the first layer, the model selection procedure is identical to that in image inpainting and zooming, i.e., it 
is calculated with the Gaussian models corresponding to the directional PCA bases {Bt}i</t<^, Figure [2]-(c). 
In this layer, a directional PCA of orientation 6 is selected for each patch. Given the directional basis 
selected in the first layer, the second layer recalculates the model selection (O, this time with a family of 
position PCA bases {Bk,p}i<p<p corresponding to the same direction 6 as the directional basis selected 
in the first layer, with atoms in each basis B^.p localized at one position, and the P bases translating in 
space and covering the whole patch. The image patch estimation ([8]) is obtained in the second layer. This 
hierarchical calculation reduces the computational complexity from 0(KP) to 0(K + P). 








1 

1 


ft 




\ Q\0 



(a) (b) 

Fig. 13. (a). Hierarchical PLE for deblurring. Each patch in the first layer symbolizes a directional PCA basis. Each 
patch in the second layer symbolizes a position PCA basis, (b) To circumvent boundary issues, deblurring a patch 
whose support is £2 can be casted as inverting an operator compounded by a masking and a convolution defined on a 
larger support £2. See text for details. 

For deblurring, boundary issues on the patches need to be addressed. Since the convolution operator is non- 
diagonal, the deconvolution of each pixel y(x) in the blurred image y involves the pixels in a neighborhood 
around x whose size depends on the blurring kernel. As the patch based methods deal with the local patches, 
for a given patch, the information outside of it is missing. Therefore, it is impossible to obtain accurate 
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deconvolution estimation on the boundaries of the patches. To circumvent this boundary problem, a larger 
patch is considered and the deconvolution is casted as a deconvolution plus an inpainting problem. Let us 
retake the notations f,, y,- and w,- to denote respectively the patches of size y/N x y/N in the original image 
f, the degraded image y, and the noise w. Let Q. be their support. Let f/, y, and w, be the corresponding 
larger patches of size (\/N + 2r) x (v^V + 2r), whose support D. is centered at the same position as Q. and 
with an extended boundary of width r (the width of the blurring kernel, see below), as illustrated in 
Figure [TSKb). Let U be an extension of the convolution operator U on A such that Uf, (x) = Uf, (x) if x € Si, 
and if x S Cl\Q.. Let M be a masking operator defined on D. which keeps all the pixels in the central part 
Q. and kills the rest, i.e., Mf;(x) = fj(x) if x 6 £1, and if x G D\D.. If the width r of the boundary D\Q. is 
larger than the radius of the blurring kernel, one can show that the blurring operation can be rewritten locally 
as an extended convolution on the larger support followed by a masking, My, = MUf,- + Mw,. Estimating f, 
from y ( - can thus be calculated by estimating f, from My,, following exactly the same algorithm, now treating 
the compounded MU as the degradation operator to be inverted. The boundary pixels in the estimate fj(:c), 
x € can be interpreted as an extrapolation from y ( , therefore less reliable. The deblurring estimate f, 

is obtained by discarding these boundary pixels from f, (which are outside of D. anyway). 

Local patch based deconvolution algorithms become less accurate if the blurring kernel support is large 
relative to the patch size. In the deconvolution experiments reported below, £1 and Q. are respectively set to 
8x8 and 12 x 12. The blurring kernels are restricted to a 5 x 5 support. 

B. Deblurring Experiments 

The deblurring experiments are performed on the gray-level images Lena, Barbara, Boat, House, and 
Cameraman, with different amounts of blur and noise. The PLE deblurring is compared with a number 
of deconvolution algorithms: "ForWaRD" (Fourier-wavelet regularized deconvolution) ll53l . "TVB" (total 
variation based) [7], "TwIST" (two-step iterative shrinkage/thresholding) |6], "SP" (sparse prior) (38), "SA- 
DCT" (shape adaptive DCT) (281, "BM3D" (3D transform-domain collaborative filtering) 118], and "DSD" 
(direction sparse deconvolution) BT1 . ForWaRD, SA-DCT and BM3D first calculate the deconvolution with 
a regularized Wiener filter in Fourier, and then denoise the Wiener estimate with, respectively, a thresholding 
estimator in wavelet and SA-DCT representations, and with the non-local 3D collaborative filtering flTTl . 
TVB and TwIST deconvolutions regularize the estimate with the image total variation prior. SP assumes a 
sparse prior on the image gradient. DSD is a recently developed sparse inverse problem estimator, described 
in Section MI- A I In the previous published works, BM3D and SA-DCT are among the deblurring methods 
that produce the highest PSNRs, followed by SP. The results of TVB, TwIST, SP, SA-DCT and DSD are 
generated by the authors' original softwares, with the parameters manually optimized, and those of ForWaRD 
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are calculated with our own implementation. The proposed algorithm runs for 5 iterations. 

Table [III] gives the ISNRs (improvement in PSNR relative to the input image) of the different algorithms 
for restoring images blurred with Gaussian kernels of standard deviation at, = 1 and 2 (truncated to a 5 x 5 
support), and contaminated by a white Gaussian noise of standard deviation a n = 5. BM3D produces the 
highest ISNRs, followed closely by SA-DCT and PLE, whose ISNRs are comparable and are moderately 
higher than with SP on average. Let us remark that BM3D and SA-DCT apply an empirical Wiener filtering 
as a post-processing that boosts the ISNR by near 1 dB. The empirical Wiener technique can be plugged 
into other sparse transform-based methods such as PLE and ForWaRD as well. Without this post-processing, 
PLE outperforms BM3D and SA-DCT on average. 

Figure [14] shows a deblurring example. All the algorithms under consideration reduce the amount of 
blur and attenuate the noise. BM3D generates the highest ISNR, followed by SA-DCT, PLE and SP, all 
producing similar visual quality, which are moderately better than the other methods. DSD accurately restores 
sharp image structures when the atoms are correctly selected, however, some artifacts due to the incorrect 
atom selection offset its gain in ISNR. The empirical Wiener filtering post-processing in BM3D and SA- 
DCT efficiently removes some artifacts and significantly improves the visual quality and the ISNR. More 
examples of PLE deblurring will be shown in the next section. 



Kernel size and input PSNR 


ForWaRD 


TVB 


TwIST 


SA-DCT 


BM3D 


SP 


DSD* 


PLE 


Lena 


a b = 1 


30.62 


2.51 


3.03 


2.87 


3.56/2.58 


4.03/3.45 


3.31 


2.56 


3.77 


O b = 2 


28.84 


2.33 


3.15 


3.13 


3.46/3.00 


3.91/3.20 


3.40 


2.47 


3.52 


House 


o b = 1 


30.04 


2.31 


3.12 


3.23 


4.14/3.07 


4.29/3.80 


3.52 


2.27 


4.38 


o b = 2 


28.02 


2.29 


3.24 


3.82 


4.21/3.64 


4.73/4.11 


3.92 


2.97 


3.90 


Boat 


o b = 1 


28.29 


1.69 


2.45 


2.44 


2.93/2.21 


3.23/2.46 


2.70 


1.93 


2.72 


o b = 2 


26.21 


1.63 


2.67 


2.59 


3.71/2.63 


3.33/2.44 


2.60 


2.02 


2.48 


Average 


o b = 1 


29.65 


2.17 


2.87 


2.84 


3.54/2.62 


3.85/3.23 


3.17 


2.25 


3.62 


o b = 2 


27.69 


2.08 


3.02 


3.18 


3.79/3.09 


3.99/3.25 


3.30 


2.48 


3.31 



TABLE III 

ISNR (IMPROVEMENT IN PSNR WITH RESPECT TO INPUT IMAGE) COMPARISON ON IMAGE DEBLURRING. IMAGES 
ARE BLURRED BY A GAUSSIAN KERNEL OF STANDARD DEVIATION Ob = 1 AND 2, AND ARE THEN CONTAMINATED 

by white Gaussian noise of standard deviation a n = 5. From left to right: ForWaRD [53], TVB Q, 
TwIST J6l, SA-DCT (with/without empirical Wiener post-processing) IT28TI , BM3D (with/without 
empirical Wiener post-processing) lfT8ll . SP IT3811 . DSD PTTl . and the proposed PLE framework. The 

BOTTOM BOX SHOWS THE AVERAGE ISNRS GIVEN BY EACH METHOD OVER ALL THE IMAGES WITH DIFFERENT 
AMOUNTS OF BLUR. THE HIGHEST ISNR IN EACH ROW IS IN BOLDFACE, WHILE THE HIGHEST WITHOUT 
POST-PROCESSING IS IN ITALIC. THE ALGORITHMS WITH * USE A TRAINING DATASET. 



C. Zooming deblurring 

When an anti-aliasing filtering is taken into account, image zooming-out can be formulated as y = SUf, 
where f is the high-resolution image, U and S are respectively an anti-aliasing convolution and a subsampling 
operator, and y is the resulting low-resolution image. Image zooming aims at estimating f from y, which 
amounts to inverting the combination of the two operators S and U. 
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(a) Original (b) Blurred and noisy (c) BM3D (D) PLE 



Fig. 14. Gray-level image deblurring. (a) Crop from Lena, (b) Image blurred by a Gaussian kernel of standard deviation 
Gb = 1 and contaminated by white Gaussian noise of standard deviation a„ = 5 (PSNR=27.10). (c) and (d). Images 
deblurred by BM3D with empirical Wiener post-processing (ISNR 3.40 dB dB) lfl8l . and the proposed PLE framework 
(ISNR 2.94 dB). ISNR produced by the other methods under consideration: BM3D without empirical Wiener post- 
processing (2.65 dB) flU, TVB (2.72 dB) Q, TwIST (2.61dB) (6), SP (2.93 dB) (38), SA-DCT with/without empirical 
Wiener post-processing (2.95/2.10 dB) ED, and DSD (1.95 dB) ED- 

Image zooming can be calculated differently under different amounts of blur introduced by U. Let us 
distinguish between three cases: (i) If the anti-aliasing filtering U removes enough high-frequencies from f 
so that y = SUf is free of aliasing, then the subsampling operator S can be perfectly inverted with a linear 
interpolation denoted as I, i.e., IS =Id P31 . In this case, zooming can can be calculated as a deconvolution 
problem on Iy = Uf, where one seeks to invert the convolution operator U. In reality, however, camera and 
television images contain, always a certain amount of aliasing, since it improves the visual perception, i.e., 
the anti-aliasing filtering U does not eliminate all the high-frequencies from f. (ii) When U removes a small 
amount of high-frequencies, which is often the case in reality, zooming can be casted as an interpolation 
problem (39) > 03, ED, ED, E3, E3, where one seeks to invert only S, as addressed in Section LYU 
(iii) When U removes an intermediate amount of blur from f, the optimal zooming solution is inverting SU 
together as a compounded operator, as investigated in |[65l . 

This section introduces a possible solution for the case (iii) with the PLE deblurring. A linear interpolation 
I is first applied to partially invert the subsampling operator S. Due to the aliasing, the linear interpolation 
does not perfectly restore Uf, nevertheless it remains rather accurate, i.e., the interpolated image Iy = ISUf 
is close to the blurred image Uf, as Uf has limited high-frequencies in the case (iii). The PLE deblurring 
framework is then applied to deconvolve U from Iy. Inverting the operator U is simpler than inverting the 
compounded operator SU. As the linear interpolation I in the first step is accurate enough in the case (iii), 
deconvolving Iy results in accurate zooming estimates. 

In the experiments below, the anti-aliasing filter U is set as a Gaussian convolution of standard deviation 
Go = 1 and Sisanxxi = 2x2 subsampling operator. It has been shown that a pre-filtering with a 
Gaussian kernel of Oq = 0.8s guarantees that the following jxj subsampling generates a quasi aliasing-free 
image ll50l . For a 2 x 2 subsampling, the anti-aliasing filtering U with Oq = 1.0 leads to an amount of 
aliasing and visual quality similar to that in common camera pictures in reality. 

Figure [15] illustrates an experiment on the image Lena. Figures [T5T(a) to (c) show, respectively, a crop of 
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(a) f (b) Uf (c) y = SUf (d) Iy (e) PLE (f) SR 

Fig. 15. Color image zooming deblurring. (a) Crop from Lena: f. (b) Image pre-filtered with a Gaussian kernel of 
standard deviation Oq = 1-0: Uf. (c) Image subsampled from Uf by a factor of 2 x 2: y = SUf. (d) Image interpolated 
from y with a cubic spline interpolation: Iy (31.03 dB). (d) Image deblurred from Iy by the proposed PLE framework 
(34.27 dB). (e) Image zoomed from y with f65| (29.66 dB). The PSNRs are calculated on the cropped image between 
the original f and the one under evaluation. 

the original image f, the pre-filtered version Uf, and the low-resolution image after subsampling y = SUf. 
As the amount of aliasing is limited in y thanks to the anti-aliasing filtering, a cubic spline interpolation is 
more accurate than lower ordered interpolations such as bicubic [63], and is therefore applied to upsample 
y, the resulting image Iy illustrated in Figure [T5T(d). A visual inspection confirms that Iy is very close to 
Uf, the PSNR between them being as high as 50.02 dB. The PLE deblurring is then applied to calculate the 
final zooming estimate f by deconvolving U from Iy. (As no noise is added after the anti-aliasing filter, the 
noise standard deviation is set to a small value a = 1.) As illustrated in Figure [T5T(e). the resulting image 
f is much sharper, without noticeable artifacts, and improves by 3.12 dB with respect to Iy. Figure [T5T(f) 
shows the result obtained with "SR" (sparse representation) [65]. SR implements a sparse inverse problem 
estimator that tries to invert the compounded operator SU, with a dictionary learned from a natural image 
dataset. The experiments were performed with the authors' original software and training image set. The 
dictionaries were retrained with the US described above. It can be observed that the resulting image looks 
sharper and the restoration is accurate when the atoms selection is correct. However, due to the coherence of 
the dictionaries as explained in Section IIII-AI some noticeable artifacts along the edges are produced when 
the atoms are incorrectly selected, which also offset its gain in PSNR. 

Figure [16] shows another set of experiments on the image Girl. Again PLE efficiently reduces the blur 
from the interpolated image and leads to a sharp zoomed image without noticeable artifacts. SR produces 
similarly good visual quality as PLE, however, some slight but noticeable artifacts (near the end of the nose 
for example) due to the incorrect atom selection offset its gain in PSNR. 

Table [IV] gives the PSNRs comparison on the color image images Lena, Girl and Flower. PLE deblurring 
from the cubic spline interpolation improves from 1 to 2 dB PSNR over the interpolated images. Although 
SR is able to restore sharp images, its gain in PSNR is offset by the noticeable artifacts. 
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(a) HR (b) LR (c) Cubic spline (c) PLE (d) SR 



Fig. 16. Color image zooming deblurring. (a) Crop from Girl: f. (b) Image pre-filtered with a Gaussian kernel of 
standard deviation Go = 1-0, and subsampled by a factor of 2 x 2: y = SUf. (c) Image interpolated from y with a cubic 
spline interpolation: Iy (29.40 dB). (d) Image deblurred from Iy by the proposed PLE framework (30.49 dB). (e) Image 
zoomed from y with (HI (28.93 dB). 





Cubic spline 


SR* 


PLE 


Lena 


31.60 


30.64 


33.78 


Girl 


30.62 


30.43 


31.82 


Flower 


37.02 


35.96 


39.06 



TABLE IV 

PSNR COMPARISON IN IMAGE ZOOMING. THE HIGH-RESOLUTION IMAGES ARE BLURRED AND SUBSAMPLED TO 
GENERATE THE LOW-RESOLUTION IMAGES. THE FIRST COLUMN SHOWS THE PSNR PRODUCED BY CUBIC SPLINE 
INTERPOLATION. PLE DEBLURRING IS APPLIED OVER THE INTERPOLATED IMAGES AND THE RESULTING PSNRS 
ARE SHOWN IN THE THIRD COLUMN. THE SECOND COLUMN GIVES THE PSNR OBTAINED WITH SR ll65l . THE 
HIGHEST PSNR IN EACH ROW IS IN BOLDFACE. THE ALGORITHMS WITH * USE A TRAINING DATASET. 



VIII. Conclusion and future works 

This work has shown that Gaussian mixture models (GMM) and a MAP-EM algorithm provide general 
and computational efficient solutions for inverse problems, leading to results in the same ballpark as state- 
of-the-art ones in various image inverse problems. A dual mathematical interpretation of the framework with 
structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes 
and improves the traditional sparse inverse problem approach. This connection also suggests an effective 
dictionary motivated initialization for the MAP-EM algorithm. In a number of image restoration applications, 
including inpainting, zooming, and deblurring, the same simple and computationally efficient algorithm (its 
core, (0]), ([5]), ([7]) and ([8]), can be written in 4 lines Matlab code) produce either equal, often significantly 
better, or very small margin worse results than the best published ones, with a computational complexity 
typically one or two magnitude smaller than l\ sparse estimations. Similar results (on average just 0.1 dB 
lower than BM3D ifTTl ) are obtained for the simpler problem of denoising (U the identity matrix). 

As described in Section [TTJ the proposed algorithm is calculated with classic statistical tools of MAP-EM 
clustering and empirical covariance estimation. As a possible future work, its performance may be further 
improved with more sophisticated statistical instruments, for example, the stochastic EM algorithms [13] 
and more advanced covariance regularization methods B71 . at a cost of higher computational complexity. 
Acknowledgements: Work supported by NSF, NGA, ONR, ARO and NSSEFF. We thank Stephanie Allassonniere for 
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helpful discussions, in particular about MAP-EM and covariance regularization. 
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